CS256 Fall 2017 Practice Final

To study for the final I would suggest you:

  (1) Know how to do (by heart) all the practice problems.
  (2) Go over your notes at least three times. On the second and third passes, try to see how much you can remember from the previous pass.
  (3) Go over the homework problems.
  (4) Try to create your own problems similar to the ones I have given, and solve them.
  (5) Skim the relevant sections from the book.
  (6) If you want to study in groups, at this point you are ready to quiz each other.

The practice final is below. Here are some facts about the actual final:

  (a) It is comprehensive.
  (b) It is closed book and closed notes. Nothing will be permitted on your desk except your pen (or pencil) and the test.
  (c) You should bring photo ID.
  (d) There will be more than one version of the test. Each version will be of comparable difficulty.
  (e) It has 10 problems (2 pts each): 6 problems will be on material covered since the second midterm, and 2 problems will come from the topics of each midterm (4 problems in total).
  (f) Two problems will be taken exactly (modulo typos) from the practice final, and one from each practice midterm.

  1. Briefly define each of the following noise robustness concepts: input perturbation, output perturbation, and weight perturbation.
  2. Describe the basic semi-supervised generative/discriminative learning algorithm of Ulusoy and Bishop (2005) presented in class.
  3. Suppose we have `5` regression models, where the `i`th model makes an error `epsilon_i` drawn from a zero-mean normal distribution with variance `E[(epsilon_i)^2] = .2`. Suppose the covariances are all `E[epsilon_i epsilon_k] = .5` for `i != k`. What is the expected squared error of the bagging predictor? (A worked sketch appears after this list.)
  4. With regard to neural net regularization, what is dropout? Explain how it works. (A code sketch appears after this list.)
  5. Give the RMSProp adaptive learning rate algorithm. (A sketch appears after this list.)
  6. Describe the layers in the LeNet-5 neural net architecture.
  7. Give a recurrent NN architecture in which teacher forcing can be used for training, and one in which it cannot. Explain your answer.
  8. Explain, with reference to the actual internal gate computations, how LSTMs might solve the long-term dependency problem. (A sketch of the gate computations appears after this list.)
  9. What were the steps in Ng's NN design methodology?
  10. What is n-gram smoothing? What is higher-order/lower-order back-off modeling? (A sketch appears after this list.)
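
For problem 3, here is a minimal worked sketch, assuming the intended tool is the standard bagging error decomposition (as in Goodfellow, Bengio, and Courville's Deep Learning, Section 7.11): if `k` models make errors with common variance `v = E[(epsilon_i)^2]` and common pairwise covariance `c = E[epsilon_i epsilon_k]`, the averaged predictor has expected squared error `E[((1/k) sum_i epsilon_i)^2] = v/k + (k-1)c/k`.

```python
# Worked arithmetic for problem 3, assuming the standard bagging
# decomposition E[(mean error)^2] = v/k + (k - 1) * c / k.
k = 5      # number of regression models
v = 0.2    # common error variance E[(epsilon_i)^2]
c = 0.5    # common error covariance E[epsilon_i * epsilon_k], i != k

expected_sq_error = v / k + (k - 1) * c / k
print(expected_sq_error)   # 0.2/5 + (4/5)*0.5 = 0.04 + 0.40 = 0.44
```

Note the two limiting cases this formula captures: perfectly correlated errors (`c = v`) give back `v`, so averaging does not help at all, while uncorrelated errors (`c = 0`) give `v/k`.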
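For problem 4, here is a minimal NumPy sketch of one common formulation, inverted dropout: during training each unit is kept with probability `keep_prob` and the survivors are rescaled by `1/keep_prob`, so at test time the layer is used unchanged. The keep probability and array shapes below are illustrative assumptions, not values from the course.

```python
import numpy as np

def dropout_forward(activations, keep_prob=0.8, training=True):
    """Inverted dropout: randomly zero units during training and rescale
    the survivors by 1/keep_prob so the expected activation matches
    test time (where this function is a no-op)."""
    if not training:
        return activations
    mask = np.random.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob

# Illustrative usage on a batch of hidden-layer activations.
h = np.random.randn(4, 10)
h_train = dropout_forward(h, keep_prob=0.8, training=True)
h_test = dropout_forward(h, training=False)
```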
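For problem 5, here is a minimal sketch of the RMSProp update as it is usually stated: keep an exponentially decaying average of squared gradients and divide the step, per parameter, by its square root. The decay rate, learning rate, and toy objective below are illustrative assumptions.

```python
import numpy as np

def rmsprop(grad_fn, theta, lr=0.01, rho=0.9, eps=1e-8, steps=500):
    """RMSProp: r accumulates a decaying average of squared gradients;
    the effective per-parameter step is lr / (sqrt(r) + eps)."""
    r = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        r = rho * r + (1 - rho) * g * g
        theta = theta - lr * g / (np.sqrt(r) + eps)
    return theta

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2 * theta.
print(rmsprop(lambda t: 2 * t, np.array([5.0])))  # tends toward 0
```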
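For problem 8, here is a minimal NumPy sketch of one standard presentation of a single LSTM time step, with the forget, input, and output gates written out. The relevant point for the long-term dependency question is that the cell state is updated additively (`c_new = f * c + i * g`), so when the forget gate stays near 1, gradients can flow through many time steps without being repeatedly squashed. The stacked-parameter layout and the shapes below are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. W, U, b stack the parameters of the forget
    gate f, input gate i, candidate content g, and output gate o."""
    n = c.shape[0]
    z = W @ x + U @ h + b            # stacked pre-activations, shape (4n,)
    f = sigmoid(z[0:n])              # forget gate: how much old state to keep
    i = sigmoid(z[n:2*n])            # input gate: how much new content to write
    g = np.tanh(z[2*n:3*n])          # candidate cell content
    o = sigmoid(z[3*n:4*n])          # output gate: how much state to expose
    c_new = f * c + i * g            # additive update: the key to long-term memory
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Illustrative shapes: input size 3, hidden/cell size 2.
n, m = 2, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * n, m))
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
h, c = lstm_step(rng.normal(size=m), h, c, W, U, b)
```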
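For problem 10, here is a minimal sketch using two common instances: add-one (Laplace) smoothing as the n-gram smoothing example, and "stupid backoff" (Brants et al., 2007) as a simple higher-order-to-lower-order back-off, where an unseen bigram falls back to a scaled unigram estimate. The toy corpus and the back-off weight `alpha` are illustrative assumptions.

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)   # vocabulary size
N = len(corpus)     # token count

def p_laplace(w, prev):
    """Add-one smoothed bigram probability: nonzero even for unseen pairs."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

def p_backoff(w, prev, alpha=0.4):
    """Stupid backoff: use the higher-order (bigram) estimate when its
    count is nonzero; otherwise back off to a scaled unigram estimate."""
    if bigrams[(prev, w)] > 0:
        return bigrams[(prev, w)] / unigrams[prev]
    return alpha * unigrams[w] / N

print(p_laplace("sat", "cat"))   # seen bigram, smoothed estimate
print(p_backoff("ran", "the"))   # unseen bigram, falls back to unigram
```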